Attend and Interact: Higher-Order Object Interactions for Video Understanding
Authors
Abstract
Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single-object representations or pairwise object relationships. Furthermore, learning interactions across multiple objects in hundreds of video frames is computationally infeasible, and performance may suffer because a large combinatorial space has to be modeled. In this paper, we propose to efficiently learn higher-order interactions between arbitrary subgroups of objects for fine-grained video understanding. We demonstrate that modeling object interactions significantly improves accuracy for both action recognition and video captioning, while saving more than 3 times the computation over traditional pairwise relationships. The proposed method is validated on two large-scale datasets: Kinetics and ActivityNet Captions. Our SINet and SINet-Caption achieve state-of-the-art performance on the two datasets, respectively, even though the videos are sampled at a maximum of 1 FPS. To the best of our knowledge, this is the first work to model object interactions on open-domain large-scale video datasets; we additionally model higher-order object interactions, which further improve performance at low computational cost.
Similar resources
Gene-Nutrient Interactions in Cancer Incidence: A Systematic Review
Advances in molecular biology over the past decades have contributed to a profound understanding of the function of genes in the development of diseases. Environmental and nutritional factors interact with a subject's genetic background, resulting in the development of various diseases, including cancer, cardiovascular disease, and degenerative nervous disorders. However, the exact mechanisms o...
Grounded Objects and Interactions for Video Captioning
We address the problem of video captioning by grounding language generation on object interactions in the video. Existing work mostly focuses on overall scene understanding with often limited or no emphasis on object interactions to address the problem of video understanding. In this paper, we propose SINet-Caption that learns to generate captions grounded over higher-order interactions between...
The Role of Mosques in Social Interactions between the neighbors (Case Study: Shahshahan Neighborhood of Isfahan)
Problem statement: The manifestation of modern thinking in urban life and, consequently, the spread of a culture of individualism and civic indifference in recent years have confronted urban dwellers with problems such as isolation and reduced social interaction. Therefore, people's need for places that support social interaction and meet psychological needs has become one of the necessities of urb...
A Conversation Analytic Study on the Teachers’ Management of Understanding-Check Question Sequences in EFL Classrooms
Teacher questions are claimed to be constitutive of classroom interaction because of their crucial role both in the construction of knowledge and the organization of classroom proceedings (Dalton Puffer, 2007). Most previous research on teachers’ questions has focused on identifying and discovering different question types believed to be helpful in creating the opportunities for learners’...
A New Method for Error Concealment in Video Frames Using an RBF Neural Network
Transmission of compressed video over error prone channels may result in packet losses, which can degrade the image quality. Error concealment (EC) is an effective approach to reduce the degradation caused by the missed information. The conventional temporal EC techniques are always inefficient when the motions of the video object are irregular. In this paper, in order to overcome this problem,...
Journal title: CoRR
Volume: abs/1711.06330
Issue: -
Pages: -
Publication date: 2017